Efficient Dimensionality Reduction for High-Dimensional Network Estimation
نویسندگان
چکیده
We propose module graphical lasso (MGL), an aggressive dimensionality reduction and network estimation technique for a highdimensional Gaussian graphical model (GGM). MGL achieves scalability, interpretability and robustness by exploiting the modularity property of many real-world networks. Variables are organized into tightly coupled modules and a graph structure is estimated to determine the conditional independencies among modules. MGL iteratively learns the module assignment of variables, the latent variables, each corresponding to a module, and the parameters of the GGM of the latent variables. In synthetic data experiments, MGL outperforms the standard graphical lasso and three other methods that incorporate latent variables into GGMs. When applied to gene expression data from ovarian cancer, MGL outperforms standard clustering algorithms in identifying functionally coherent gene sets and predicting survival time of patients. The learned modules and their dependencies provide novel insights into cancer biology as well as identifying possible novel drug targets.
منابع مشابه
انجام یک مرحله پیش پردازش قبل از مرحله استخراج ویژگی در طبقه بندی داده های تصاویر ابر طیفی
Hyperspectral data potentially contain more information than multispectral data because of their higher spectral resolution. However, the stochastic data analysis approaches that have been successfully applied to multispectral data are not as effective for hyperspectral data as well. Various investigations indicate that the key problem that causes poor performance in the stochastic approaches t...
متن کاملA Monte Carlo-Based Search Strategy for Dimensionality Reduction in Performance Tuning Parameters
Redundant and irrelevant features in high dimensional data increase the complexity in underlying mathematical models. It is necessary to conduct pre-processing steps that search for the most relevant features in order to reduce the dimensionality of the data. This study made use of a meta-heuristic search approach which uses lightweight random simulations to balance between the exploitation of ...
متن کاملMultivariate time-series analysis and diffusion maps
Dimensionality reduction in multivariate time series has broad applications, ranging from financial-data analysis to biomedical research. However, high levels of ambient noise and various interferences result in nonstationary signals, which may lead to inefficient performance of conventional methods. In this paper, we propose a nonlinear dimensionality reduction framework using diffusion maps o...
متن کاملHead Pose Estimation via Manifold Learning
For the last decades, manifold learning has shown its advantage of efficient non-linear dimensionality reduction in data analysis. Based on the assumption that informative and discriminative representation of the data lies on a low-dimensional smooth manifold which implicitly embedded in the original high-dimensional space, manifold learning aims to learn the low-dimensional representation foll...
متن کاملComparison of MLP NN Approach with PCA and ICA for Extraction of Hidden Regulatory Signals in Biological Networks
The biologists now face with the masses of high dimensional datasets generated from various high-throughput technologies, which are outputs of complex inter-connected biological networks at different levels driven by a number of hidden regulatory signals. So far, many computational and statistical methods such as PCA and ICA have been employed for computing low-dimensional or hidden represe...
متن کامل